Search results for "distribution:Lingua-Treebank Lingua Treebank"
Lingua::Treebank - Perl extension for manipulating the Penn Treebank format
This class knows how to read two treebank formats, the Penn format and the Chomsky Normal Form (CNF) format. These formats differ in how they handle terminal nodes. The Penn format places pre-terminal part-of-speech tags in the left-hand position of ...
KAHN/Lingua-Treebank-0.16 - 28 Aug 2008 20:08:52 UTC
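In the Penn format, a pre-terminal such as `(NN dog)` carries the part-of-speech tag as its label and the word as its only child. The following is a minimal Python sketch, not the module's Perl API (`parse` and `preterminals` are hypothetical names), that reads one bracketed Penn tree and recovers the tag/word pairs. It does not attempt to model the CNF format.

```python
import re
from typing import List, Tuple

# A node is (label, children); a word (leaf) is a plain string.
Tree = Tuple[str, list]

TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse(text: str) -> Tree:
    """Parse one bracketed Penn-style tree, e.g. "(S (NP (NN dog)))"."""
    tokens = TOKEN.findall(text)
    pos = 0

    def node() -> Tree:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        label = ""
        # The token right after "(" is the node label; the outer wrapper
        # bracket in Treebank files may have no label at all.
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children: list = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return node()


def preterminals(tree: Tree) -> List[Tuple[str, str]]:
    """Return (POS tag, word) pairs: nodes whose only child is a word."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(label, children[0])]
    pairs: List[Tuple[str, str]] = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(preterminals(child))
    return pairs
```

For example, `preterminals(parse("(S (NP (DT the) (NN dog)) (VP (VBZ barks)))"))` returns `[("DT", "the"), ("NN", "dog"), ("VBZ", "barks")]`.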
Lingua::Treebank::Const - Object modeling constituent from a treebank
Module for describing simple constituents of the Penn Treebank. Recursive behaviors are implied. Note the assumption that terminal nodes (those with defined "word" values) will not have "children", and vice versa. This assumption is currently unchecked b...
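The terminal/non-terminal contract described above can be sketched in Python with the check made explicit at construction time. `Const`, `tag`, `is_terminal`, and `words` are illustrative names for this sketch, not a transcription of Lingua::Treebank::Const's methods; only the "word" and "children" notions come from the description.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Const:
    """One constituent: a terminal carries a word, a non-terminal children."""

    tag: str
    word: Optional[str] = None
    children: List["Const"] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enforce the assumption: a node with a word must not also have children.
        if self.word is not None and self.children:
            raise ValueError(f"terminal {self.tag!r} must not have children")

    def is_terminal(self) -> bool:
        return self.word is not None

    def words(self) -> List[str]:
        """Collect the words under this node, left to right (recursive)."""
        if self.is_terminal():
            return [self.word]
        result: List[str] = []
        for child in self.children:
            result.extend(child.words())
        return result
```

Here `Const("NP", children=[Const("DT", word="the"), Const("NN", word="dog")]).words()` yields `["the", "dog"]`, while giving a node both a word and children raises `ValueError`.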
Lingua::Treebank::HeadFinder - Head-finding in Lingua::Treebank
The L::TB::HeadFinder object is initialized from a list like the one in To do...
get_words - given collapsed treebank, print words only
Reads input files (or STDIN) for Penn-style trees, one per line, and prints out only the words, one tree per line. Providing the "-sgml" option makes the output pseudo-SGML by including angle-bracketed "<s>" and "</s>" tokens at the beginning and end of...
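Word extraction from a one-line Penn tree needs no full parser: in a bracketed tree, the token right after "(" is a label, and any other non-bracket token is a word. A minimal Python sketch of that idea, with an `sgml` flag standing in for the script's "-sgml" option (`words`, `format_words`, and `get_words` are hypothetical names):

```python
import re

TOKEN = re.compile(r"\(|\)|[^\s()]+")


def words(tree_line: str) -> list:
    """Return the words of a one-line Penn tree, left to right.

    A word is any non-bracket token that does not directly follow "(";
    the token directly after "(" is a node label (S, NP, NN, ...).
    """
    tokens = TOKEN.findall(tree_line)
    return [
        tok
        for i, tok in enumerate(tokens)
        if tok not in ("(", ")") and i > 0 and tokens[i - 1] != "("
    ]


def format_words(tree_line: str, sgml: bool = False) -> str:
    """Join the words with spaces; optionally wrap them in <s> ... </s>."""
    out = words(tree_line)
    if sgml:
        out = ["<s>"] + out + ["</s>"]
    return " ".join(out)


def get_words(lines, sgml: bool = False):
    """Yield one output line per non-blank input tree line."""
    for line in lines:
        if line.strip():
            yield format_words(line, sgml)
```

To read files or STDIN as the script does, the generator can be fed from the standard `fileinput` module, e.g. `for out in get_words(fileinput.input(files), sgml=True): print(out)`, where `files` is a list of file names (an empty list reads STDIN).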
list-edges - reads Penn treebanks, prints out all edges found in each tree, one tree per line
This program lists all edges in the trees presented, one tree per line. Edges are written as LABEL,INDEX,INDEX, where the INDEX values are positions between the words, counted from 0. CAVEATS: The trees must be in Penn treebank format. TO DO: None that I know of....
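The between-word indexing can be made concrete: in "the dog barks", position 0 precedes "the" and position 3 follows "barks", so an NP over "the dog" is the edge NP,0,2. Below is a minimal Python sketch that computes such edges with a single stack pass over the bracket tokens. Whether the script counts part-of-speech nodes as edges, and the order and separator it uses, are assumptions here, exposed as the hypothetical `include_pos` parameter and the space-separated output of `format_edges`.

```python
import re

TOKEN = re.compile(r"\(|\)|[^\s()]+")


def edges(tree_line: str, include_pos: bool = True) -> list:
    """Return (label, start, end) for every labeled constituent.

    start/end are 0-based positions *between* words. Edges are emitted
    in the order their closing brackets are reached.
    """
    tokens = TOKEN.findall(tree_line)
    stack = []   # open constituents: [label, start, dominates_a_word]
    found = []
    n_words = 0  # words seen so far == current between-word position
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            label = ""
            if i + 1 < len(tokens) and tokens[i + 1] not in ("(", ")"):
                label = tokens[i + 1]
                i += 1  # skip the label token
            stack.append([label, n_words, False])
        elif tok == ")":
            label, start, is_preterminal = stack.pop()
            if label and (include_pos or not is_preterminal):
                found.append((label, start, n_words))
        else:
            n_words += 1
            stack[-1][2] = True  # innermost open node directly holds a word
        i += 1
    return found


def format_edges(tree_line: str, include_pos: bool = True) -> str:
    """One output line per tree: LABEL,INDEX,INDEX items separated by spaces."""
    return " ".join(f"{lab},{s},{e}" for lab, s, e in edges(tree_line, include_pos))
```

For "(S (NP (DT the) (NN dog)) (VP (VBZ barks)))", `format_edges(..., include_pos=False)` gives `NP,0,2 VP,2,3 S,0,3`.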
vocabulary - extract vocabularies from Penn treebank files
Given a list of Penn treebank files, this script extracts the words, parts of speech, and non-terminal node names and emits each in a separate file in order of frequency. Note that giving a "-" argument for any of ntfile, posfile, or wordfile causes ...
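The three vocabularies can be collected in one pass per tree: a node that directly dominates a word counts as a part-of-speech tag, and every other labeled node counts as a non-terminal. A minimal Python sketch using `collections.Counter`; `count_vocab` and `by_frequency` are hypothetical names, and the alphabetical tie-breaking is this sketch's own choice.

```python
import re
from collections import Counter

TOKEN = re.compile(r"\(|\)|[^\s()]+")


def count_vocab(tree_lines):
    """Count words, part-of-speech tags, and non-terminal labels.

    A node that directly dominates a word is treated as a part-of-speech
    (pre-terminal) node; every other labeled node is a non-terminal.
    """
    words, tags, nonterminals = Counter(), Counter(), Counter()
    for line in tree_lines:
        tokens = TOKEN.findall(line)
        stack = []  # open nodes: [label, directly_dominates_a_word]
        for i, tok in enumerate(tokens):
            if tok == "(":
                nxt = tokens[i + 1] if i + 1 < len(tokens) else ")"
                stack.append([nxt if nxt not in ("(", ")") else "", False])
            elif tok == ")":
                label, is_preterminal = stack.pop()
                if label:
                    (tags if is_preterminal else nonterminals)[label] += 1
            elif i > 0 and tokens[i - 1] != "(":  # not a label: a word
                words[tok] += 1
                stack[-1][1] = True
    return words, tags, nonterminals


def by_frequency(counter):
    """Most frequent first; ties broken alphabetically for stable output."""
    return sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
```

Each of the three counters can then be written to its own file, e.g. one `item<TAB>count` line per entry of `by_frequency(words)`; that output layout is an assumption, not the script's documented format.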
tree-inflate - transform a one-tree-per-line treebank into something human-readable
Reads one tree per line from STDIN or the indicated files, reformats the trees according to a Penn standard (spreading daughters onto the next line, applying indentation, etc.) and prints them to STDOUT. Handy with *less* etc. for spot-checking trees stored in...
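Inflating a tree amounts to a recursive print in which each non-terminal's daughters go on their own lines, indented one level deeper. The Python sketch below keeps pre-terminals such as `(NN dog)` on one line and indents by a hypothetical `step` of two spaces; the script's exact "Penn standard" layout may differ, and `parse`/`inflate` are illustrative names.

```python
import re

TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse(text):
    """Parse one bracketed tree into (label, children); words are strings."""
    tokens = TOKEN.findall(text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # skip "(" (valid input assumed)
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # skip ")"
        return (label, children)

    return node()


def inflate(tree, indent=0, step=2):
    """Render a tree one constituent per line, children indented by `step`.

    Pre-terminals such as (NN dog) stay on a single line.
    """
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return f"({label} {children[0]})"
    pad = " " * (indent + step)
    lines = [
        pad + (inflate(c, indent + step, step) if isinstance(c, tuple) else c)
        for c in children
    ]
    return f"({label}\n" + "\n".join(lines) + ")"
```

Applied to "(S (NP (DT the) (NN dog)) (VP (VBZ barks)))", this prints:

```
(S
  (NP
    (DT the)
    (NN dog))
  (VP
    (VBZ barks)))
```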
tree-collapse - reads multi-line Penn trees from files or STDIN and outputs trees one per line.
Reads inflated Penn treebank-format trees, with children indented and possibly on different lines, and outputs intact trees, one tree per line with whitespace as input....
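Collapsing is the inverse of inflating: track bracket depth across lines and emit the accumulated text whenever the depth returns to zero. A minimal Python sketch, assuming each tree starts on a fresh line and that runs of whitespace inside a tree may be squeezed to single spaces (the script's own whitespace handling may differ); `collapse` is a hypothetical name.

```python
import re


def collapse(lines):
    """Yield each multi-line bracketed tree as a single line.

    Bracket depth is tracked across lines; a tree ends when the depth
    returns to zero. Runs of whitespace inside a tree become one space.
    """
    buf, depth = [], 0
    for line in lines:
        if depth == 0 and not line.strip():
            continue  # skip blank lines between trees
        buf.append(line)
        depth += line.count("(") - line.count(")")
        if depth < 0:
            raise ValueError("unbalanced ')' in input")
        if depth == 0:
            yield re.sub(r"\s+", " ", " ".join(buf)).strip()
            buf = []
    if buf:
        raise ValueError("input ended inside an unfinished tree")
```

Because it is a generator over lines, it works directly on an open file or on `sys.stdin`.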
list-rewrites - reads Penn treebanks, prints out all rewrites found
This program lists all rewrites in all trees presented to this script in files or on STDIN. CAVEATS: The trees must be in Penn treebank format. The rewrites will not necessarily be unique; if you want them to be unique, you will have to pipe the output...
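A rewrite is the context-free rule read off one internal node: its label on the left, its daughters' labels in order on the right. The Python sketch below lists them in pre-order and, like the script, keeps duplicates. The `LHS -> RHS` text format and the `include_lexical` switch for pre-terminal-to-word rules are assumptions of this sketch, and `parse`/`rewrites` are illustrative names.

```python
import re

TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse(text):
    """Parse one bracketed tree into (label, children); words are strings."""
    tokens = TOKEN.findall(text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # skip "(" (valid input assumed)
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # skip ")"
        return (label, children)

    return node()


def rewrites(tree, include_lexical=False):
    """List "LHS -> RHS" rewrites in pre-order, duplicates included.

    Pre-terminal -> word rules are skipped unless include_lexical is True.
    """
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [f"{label} -> {children[0]}"] if include_lexical else []
    rhs = " ".join(c[0] if isinstance(c, tuple) else c for c in children)
    out = [f"{label} -> {rhs}"]
    for c in children:
        if isinstance(c, tuple):
            out.extend(rewrites(c, include_lexical))
    return out
```

Within Python, `set(rewrites(tree))` gives the distinct rules and `collections.Counter(rewrites(tree))` gives how often each one occurs.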